Computing the Fault Tolerance of Multi-agent Deployment

Authors

  • Yingqian Zhang
  • Efrat Manisterski
  • Sarit Kraus
  • David Peleg
Abstract

There have been tremendous advances in the last decade in the theory and implementation of massive multi-agent systems. However, one major obstacle to the wider deployment of multi-agent systems (MASs) is their limited ability to tolerate failures. MASs deployed across a network can quickly “go down” due to external factors such as power failures, network outages, malicious attacks, and other system issues. Protection against such unexpected failures that disable a node is critical if agents are to be used as the backbone of real-world applications. We are concerned with how replication can form the basis of one tool (among the many that are needed) to prevent a MAS from succumbing to failure. By replicating agents, we hope to improve the fault tolerance of a multi-agent system. Fault tolerance and replication techniques have been studied extensively in distributed computing systems, but much less so in the multi-agent systems domain. The faults considered in this paper are those that cause the disconnection (or crash) of nodes in the network where the MAS application resides. In the fault model we consider, the failure of each node in the network is represented by a probability. Under such a fault model, the agents located on the nodes have different probabilities of being unavailable, and therefore the multi-agent system as a whole has some probability of ceasing to function. The idea behind using replication as a fault tolerance method in our work is thus that, when failures occur, at least one copy of each agent will continue to reside on a connected, working host computer (node), so that the MAS as a whole can function as a unified application. In this paper, we focus on the problem of measuring the probability that a multi-agent system will tolerate node failures. We call this probability the survivability of a MAS.
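The survivability measure described in the abstract can be illustrated with a small brute-force sketch. This is not the paper's algorithm. It assumes that node failures are independent (the abstract only says each node's failure is represented by a probability), and the names `survivability`, `fail_prob`, and `deployment` are hypothetical. The sketch enumerates every up/down state of the nodes and sums the probability of the states in which each agent keeps at least one replica on a working node.

```python
from itertools import product


def survivability(fail_prob, deployment):
    """Probability that every agent has at least one replica on a working node.

    fail_prob:  dict mapping node name -> failure probability (assumed independent).
    deployment: dict mapping agent name -> set of nodes hosting a replica.
    Illustrative brute force only: the loop visits 2**len(fail_prob) states.
    """
    nodes = sorted(fail_prob)
    total = 0.0
    # Each state assigns True (node is up) or False (node has failed) to every node.
    for state in product([True, False], repeat=len(nodes)):
        up = {n for n, alive in zip(nodes, state) if alive}
        # The MAS survives this state if each agent has a replica on some up node.
        if all(replicas & up for replicas in deployment.values()):
            p = 1.0
            for n, alive in zip(nodes, state):
                p *= (1.0 - fail_prob[n]) if alive else fail_prob[n]
            total += p
    return total


# Hypothetical example: three nodes, two agents, each agent replicated twice.
fail_prob = {"n1": 0.1, "n2": 0.2, "n3": 0.5}
deployment = {"a": {"n1", "n2"}, "b": {"n2", "n3"}}
print(survivability(fail_prob, deployment))
```

In this example the system survives if n2 is up, or if n2 is down while both n1 and n3 are up, so the result should be close to 0.8 + 0.2 × 0.9 × 0.5 = 0.89. The enumeration grows exponentially with the number of nodes, so it is only practical for very small deployments.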


Similar resources

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing, which periodically ...


Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...


Fault Tolerance Multi Agent co-ordination: A petri net based approach

As technology shifts from centralized computing to distributed computing and then to ubiquitous computing, users are more dependent on the computer system for task delegation. Here autonomous agents and Multi Agent Systems (MAS) play an important role in performing the tasks delegated by the user. As faults in a MAS are non-deterministic in nature, designing a fault tolerant MAS is a challengin...


Presenting a Replicated Agent-Based Approach for Executing a Reliable Mobile Code Paradigm

Using mobile agents, it is possible to bring the code close to the resources, which is not foreseen by the traditional client/server paradigm. Compared to the client/server computing paradigm, the greater flexibility of the mobile agent paradigm comes at additional costs as well as the additional complexity of developing and managing mobile agent-based applications. Such complexity ...


A Multi-level Method for Criticality Evaluation to Provide Fault Tolerance in Multi-agent Systems

The possibility of failure is a fundamental characteristic of distributed applications. The research community in fault tolerance has developed several solutions mainly based on the concept of replication. In this paper, we propose a fault tolerant hybrid approach in multi-agent systems. We have based our strategy on two main concepts: replication and teamwork. Through this work, we have to cal...



Publication date: 2009